Conversation

@dot-agi
Member

@dot-agi dot-agi commented May 16, 2025

📥 Pull Request

📘 Description
Sets the trace context in the OpenAI instrumentor from the OpenAI Agents SDK instrumentor, so that Responses API calls made through the Agents SDK are not recorded twice.
Closes #974

🧪 Testing
Tested with the customer service agent example and the demo repo.

@dot-agi dot-agi requested review from Dwij1704, areibman and Copilot May 16, 2025 22:40
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request fixes duplicate LLM calls for the Agents SDK by adding special handling for "ResponseSpanData" and integrating custom wrappers into the OpenAI instrumentation. Key changes include:

  • Adding context propagation for ResponseSpanData in exporter.py.
  • Wrapping the Responses API calls in instrumentor.py with custom wrappers to leverage the Agents SDK trace context (see the sketch after this list).
  • Unwrapping the custom wrappers when uninstrumenting, with enhanced debug logging in both files.
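
For illustration, here is a minimal sketch of the custom-wrapper idea under stated assumptions: wrapt's wrap_function_wrapper, the "openai_agents.span_id" context key quoted later in this review, and an assumed module path. The wrapper name and fallback logic are illustrative, not the PR's actual code.

```python
# Hypothetical sketch of the custom-wrapper approach; only wrap_function_wrapper
# and the OpenTelemetry context API are known quantities here, the rest is assumed.
from wrapt import wrap_function_wrapper
from opentelemetry import context as context_api


def _responses_create_wrapper(wrapped, instance, args, kwargs):
    """Wrap Responses.create so calls made under an Agents SDK span reuse that
    span's trace context instead of producing a second, duplicate LLM span."""
    agents_span_id = context_api.get_value("openai_agents.span_id")
    if agents_span_id is not None:
        # The Agents SDK instrumentor already owns this call's span:
        # call through without starting another one.
        return wrapped(*args, **kwargs)
    # No Agents SDK context: fall back to standalone instrumentation
    # (span creation elided in this sketch).
    return wrapped(*args, **kwargs)


# Applied during _instrument(); the module and attribute paths are assumptions:
wrap_function_wrapper(
    "openai.resources.responses", "Responses.create", _responses_create_wrapper
)
```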

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Reviewed files:

  • agentops/instrumentation/openai_agents/exporter.py: Adds special handling for ResponseSpanData to propagate trace context
  • agentops/instrumentation/openai/instrumentor.py: Introduces custom wrappers for both synchronous and asynchronous responses and handles unwrapping for the custom instrumentation

Comments suppressed due to low confidence (1)

agentops/instrumentation/openai_agents/exporter.py:319

  • The variable 'span_id' is used without being defined. Please ensure 'span_id' is properly assigned before this line or update the reference if a different variable should be used.
ctx = context_api.set_value("openai_agents.span_id", span_id, ctx)

@codecov

codecov bot commented May 16, 2025

Codecov Report

Attention: Patch coverage is 58.92857% with 46 lines in your changes missing coverage. Please review.

Files with missing lines:

  • agentops/instrumentation/openai/instrumentor.py: 62.50% patch coverage, 36 lines missing ⚠️
  • agentops/instrumentation/openai_agents/exporter.py: 33.33% patch coverage, 10 lines missing ⚠️


@dot-agi dot-agi requested a review from Copilot May 16, 2025 23:16
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes duplicate LLM calls by setting the context in the OpenAI instrumentor using information from the OpenAI Agents SDK. Key changes include adding tests to verify context propagation in custom wrappers, propagating trace context within the exporter for ResponseSpanData, and updating the instrumentor to use custom wrappers for responses.
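
As a rough idea of what such a context-propagation test could look like, assuming the hypothetical _responses_create_wrapper sketched in the previous review (all names and values are illustrative):

```python
# Hypothetical test sketch: the wrapper should call straight through when an
# Agents SDK span id is already present on the OpenTelemetry context.
from unittest import mock

from opentelemetry import context as context_api


def test_wrapper_respects_agents_context():
    wrapped = mock.Mock(return_value="response")
    token = context_api.attach(
        context_api.set_value("openai_agents.span_id", "abc123")
    )
    try:
        result = _responses_create_wrapper(wrapped, None, (), {"model": "o4-mini"})
    finally:
        context_api.detach(token)
    assert result == "response"
    wrapped.assert_called_once_with(model="o4-mini")
```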

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Reviewed files:

  • tests/unit/instrumentation/openai_core/test_custom_wrappers.py: Added unit tests to ensure the custom responses wrappers correctly set OpenAI Agents SDK context.
  • agentops/instrumentation/openai_agents/exporter.py: Introduced trace context propagation for ResponseSpanData to prevent duplicate spans and calls.
  • agentops/instrumentation/openai/instrumentor.py: Updated the instrumentor to wrap and unwrap responses using custom wrappers with added logging.

Comments suppressed due to low confidence (1)

agentops/instrumentation/openai_agents/exporter.py:319

  • The variable 'span_id' is used here but not defined in this scope. Ensure 'span_id' is properly retrieved or assigned before being used.
ctx = context_api.set_value("openai_agents.span_id", span_id, ctx)

@areibman
Contributor

areibman commented May 17, 2025

(screenshot) Still seeing duplicates in the tool_example notebook.

@dot-agi dot-agi requested a review from Copilot May 19, 2025 19:39
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes duplicate LLM calls in the Agents SDK by updating how the OpenAI responses instrumentation is applied. Key changes include:

  • Switching from the standard wrap/unwrap to using custom wrappers via wrap_function_wrapper (an unwrap sketch follows this list).
  • Updating and extending tests for both synchronous and asynchronous response instrumentation.
  • Adding special handling in the OpenAI Agents exporter to propagate trace context from response spans.
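
For the unwrap side, a minimal sketch assuming the wrappers were installed with wrapt as above; the class and attribute names are assumptions about the openai client, not the PR's actual code.

```python
# Hypothetical sketch; wrapt exposes the original callable on __wrapped__, so
# uninstrumenting means swapping it back in if a proxy is present.
from openai.resources import responses
from wrapt import ObjectProxy


def _unwrap(owner, attr):
    maybe_wrapper = getattr(owner, attr, None)
    if isinstance(maybe_wrapper, ObjectProxy):
        setattr(owner, attr, maybe_wrapper.__wrapped__)


# Called from _uninstrument() for both the sync and async Responses clients:
_unwrap(responses.Responses, "create")
_unwrap(responses.AsyncResponses, "create")
```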

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Reviewed files:

  • tests/unit/instrumentation/openai_core/test_instrumentor.py: Updated tests to verify the use of custom wrappers with wrap_function_wrapper
  • tests/unit/instrumentation/openai_core/test_custom_wrappers.py: Added tests to ensure custom wrappers correctly handle context and span attributes
  • agentops/instrumentation/openai_agents/exporter.py: Added logic for processing ResponseSpanData and propagating trace context
  • agentops/instrumentation/openai/instrumentor.py: Updated _instrument and _uninstrument to use custom wrappers and provide fallback logging

Comments suppressed due to low confidence (1)

agentops/instrumentation/openai_agents/exporter.py:319

  • The variable 'span_id' is used without being defined. Consider retrieving the span identifier from the span (similarly to how 'trace_id' and 'parent_id' are obtained) to ensure proper context propagation; a sketch of this fix follows.
ctx = context_api.set_value("openai_agents.span_id", span_id, ctx)
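
For concreteness, a minimal sketch of the fix this comment suggests, assuming the Agents SDK span object exposes trace_id, parent_id, and span_id attributes; the helper name and attribute names are assumptions.

```python
# Hypothetical sketch; assigns span_id the same way trace_id and parent_id are
# reportedly read, so the flagged set_value call has a defined value.
from opentelemetry import context as context_api


def _set_agents_context(span, ctx):
    trace_id = span.trace_id    # assumed attribute, per the review's hint
    parent_id = span.parent_id  # assumed attribute, per the review's hint
    span_id = span.span_id      # the assignment the review says is missing
    ctx = context_api.set_value("openai_agents.trace_id", trace_id, ctx)
    ctx = context_api.set_value("openai_agents.parent_id", parent_id, ctx)
    ctx = context_api.set_value("openai_agents.span_id", span_id, ctx)
    return ctx
```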

@areibman
Contributor

@the-praxs The original ticket also mentions tool calls. Please either do that in this branch or make a new ticket :)

As for the review-- this is fairly complicated. @Dwij1704 has some more insights on the right method here.

@dot-agi
Member Author

dot-agi commented May 20, 2025

@the-praxs The original ticket also mentions tool calls. Please either do that in this branch or make a new ticket :)

As for the review-- this is fairly complicated. @Dwij1704 has some more insights on the right method here.

Will do it here

@areibman
Contributor

@the-praxs Another major issue here is that responses don't seem to be properly instrumented. Run any OpenAI Agents example and you'll see. Agents uses the Responses API, and for whatever reason, we don't catch prompts. Here's an example JSON we get in the AgentOps dashboard. Notice there's no prompt (user or system), only a completion.

```json
{
  "span_id": "99f46cd24e9df5ea",
  "parent_span_id": "cdcfbaa2b5798726",
  "span_name": "openai.responses.create",
  "span_kind": "Client",
  "span_type": "request",
  "service_name": "agentops",
  "start_time": "2025-05-11T04:26:18.965974",
  "end_time": "2025-05-11T04:26:38.825378",
  "duration": 19859404000,
  "status_code": "OK",
  "status_message": "",
  "attributes": {},
  "resource_attributes": {
    "ProjectId": "890e4ebb-88f1-46cd-844c-c5a995b3eab7",
    "agentops.project.id": "890e4ebb-88f1-46cd-844c-c5a995b3eab7",
    "cpu.count": "10",
    "cpu.percent": "22.3",
    "host.machine": "arm64",
    "host.name": "f3edd163dca2",
    "host.node": "macbookpro.lan",
    "host.os_release": "24.0.0",
    "host.processor": "arm",
    "host.system": "Darwin",
    "host.version": "Darwin Kernel Version 24.0.0: Tue Sep 24 23:39:07 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T6000",
    "imported_libraries": "[\"openai\",\"json\",\"agentops\",\"asyncio\"]",
    "memory.available": "4799184896",
    "memory.percent": "72.1",
    "memory.total": "17179869184",
    "memory.used": "6705627136",
    "os.type": "linux",
    "service.name": "agentops"
  },
  "span_attributes": {
    "gen_ai": {
      "completion": [
        {
          "0": {
            "content": "I’m locking Tauros into Choice Band Close Combat.\n\nWith 225 Atk and 1.5× Choice Band boost, Close Combat’s 120 BP STAB + 2× against Ogerpon’s Grass/Fire typing will reliably OHKO without ever missing—whereas Stone Edge, while stronger on paper, is 80 accuracy and a miss would let Ogerpon fire back first.",
            "id": "rs_682026ebd87481919b5840f896eb180709a776b3e68b5a1d",
            "type": "output_text"
          }
        },
        {
          "1": {
            "finish_reason": "completed",
            "id": "msg_682026fda6e88191be74ca810132bc7109a776b3e68b5a1d",
            "role": "assistant",
            "type": "message"
          }
        }
      ],
      "request": {
        "model": "o4-mini-2025-04-16",
        "temperature": "1",
        "top_p": "1"
      },
      "response": {
        "id": "resp_682026eb4d2c8191bf1f088e1e3dd9d609a776b3e68b5a1d",
        "model": "o4-mini-2025-04-16"
      },
      "usage": {
        "cache_read_input_tokens": "0",
        "completion_tokens": "2713",
        "prompt_tokens": "2705",
        "reasoning_tokens": "2624",
        "total_tokens": "5418"
      }
    },
    "instrumentation": {
      "name": "agentops",
      "version": "0.4.10"
    },
    "library": {
      "name": "openai",
      "version": "1.78.0"
    }
  },
  "event_timestamps": [],
  "event_names": [],
  "event_attributes": [],
  "link_trace_ids": [],
  "link_span_ids": [],
  "link_trace_states": [],
  "link_attributes": [],
  "metrics": {
    "total_tokens": 8042,
    "prompt_tokens": 2705,
    "completion_tokens": 2713,
    "cache_read_input_tokens": 0,
    "reasoning_tokens": 2624,
    "success_tokens": 8042,
    "fail_tokens": 0,
    "indeterminate_tokens": 0,
    "prompt_cost": "0.0029755",
    "completion_cost": "0.0119372",
    "total_cost": "0.0149127"
  }
}
```
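
As an illustration of what capturing the missing prompt could involve, here is a sketch that maps a Responses API request's instructions and input parameters onto gen_ai prompt attributes; the helper name and exact attribute keys are assumptions, not AgentOps' actual code.

```python
# Hypothetical sketch; responses.create accepts `instructions` (system-style
# text) and `input` (a string or a list of messages), neither of which shows
# up in the span above, so a wrapper could lift them into attributes like:
def extract_prompt_attributes(kwargs):
    attributes = {}
    instructions = kwargs.get("instructions")
    input_value = kwargs.get("input")
    if instructions:
        attributes["gen_ai.prompt.0.role"] = "system"
        attributes["gen_ai.prompt.0.content"] = instructions
    if isinstance(input_value, str):
        attributes["gen_ai.prompt.1.role"] = "user"
        attributes["gen_ai.prompt.1.content"] = input_value
    elif isinstance(input_value, list):
        for i, message in enumerate(input_value, start=1):
            attributes[f"gen_ai.prompt.{i}.role"] = message.get("role", "user")
            attributes[f"gen_ai.prompt.{i}.content"] = str(message.get("content", ""))
    return attributes
```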

@dot-agi
Member Author

dot-agi commented May 20, 2025

@the-praxs Another major issue here is that responses don't seem to be properly instrumented. Run any OpenAI Agents example and you'll see. Agents uses the Responses API, and for whatever reason, we don't catch prompts. Here's an example JSON we get in the AgentOps dashboard. Notice there's no prompt (user or system), only a completion.

I ran all the notebooks and checked the spans thoroughly; both prompts and completions were present in the LLM calls.

I cannot reproduce this issue, but let me see if there's something wonky.

@dot-agi
Member Author

dot-agi commented May 20, 2025

Cannot reproduce the issue you mentioned. Here are the trace IDs for each of the notebooks; they contain the data as intended:

  • Web search example: b3b938ad5b00922f67313155ce2ecd5c
  • Customer service: 810722bcf718821cd8f0043747906ce2
  • Agent workflow: 691151a51c4e9e78a6cd13d55f44cdb1

@dot-agi
Member Author

dot-agi commented May 21, 2025

Closing this since #987 makes a core change.

@dot-agi dot-agi closed this May 21, 2025
@dot-agi dot-agi deleted the fix/duplicate-agents-llm-calls branch May 21, 2025 17:41
Development

Successfully merging this pull request may close these issues:

[Bug]: OpenAI Agents SDK is not correctly instrumenting calls